ABSTRACT


This paper discusses a novel approach for developing more 
refined and accurate learner models from student data col- 
lected from Open Ended Learning Environments (OELEs). 
OELEs provide students choice in how they go about con- 
structing solutions to problems, and students exhibit a va- 
riety of learning behaviors in such environments. Building 
accurate models from a limited amount of student data is difficult;
to address this, we develop a methodology that uses
Monte Carlo Tree Search methods to boost the initial set of 
student action sequences in such a way that we can learn 
more accurate models of students’ learning behaviors. We 
use an HMM representation to model students’ learning behaviors
and demonstrate the effectiveness of our approach by
running a case study on data collected from 98 students, who 
worked with the Betty’s Brain system for four days. The re- 
sults have interesting implications for learner modeling and 
its applications to adaptive scaffolding of students’ learning 
behaviors and strategies as they learn from OELEs. 


1. INTRODUCTION 


In recent work on computer-based STEM learning environ- 
ments, there has been a focus on developing OELEs, which 
provide students with a learning goal, usually in the form of 
a complex problem or a modeling task, and a set of tools that 
support the problem-solving/modeling task [1]. To succeed, 
these students need to make choices on how to structure the 
solution process, explore alternative solution paths, develop 
awareness of their own knowledge and problem-solving skills, 
and develop strategies that support more effective learning 
and problem solving [2]. 


Given the complexities students face in working with OE- 
LEs, it is imperative that effective scaffolding be provided 
to help them progress in their learning and problem solving 
tasks and achieve their learning goals. However, an important
component of effective scaffolding is learner modeling
that can accurately capture students’ cognitive and metacognitive
processes. In this work, we take on the challenge
of using data-driven techniques to construct accurate models
of learner behaviors and performance by analyzing the
learners’ activity data from OELEs.


Proceedings of the 10th International Conference on Educational Data Mining


Gautam Biswas
Institute for Software Integrated Systems
Vanderbilt University
Nashville, U.S.
gautam.biswas@vanderbilt.edu


Typically, data-driven methods require large volumes of rich
data to support accurate and robust learner modeling. However,
collecting such data from OELEs, especially in K-12 settings,
can be a difficult, time-consuming process. To alleviate
this problem, we propose a novel set of techniques that
combine Hidden Markov Modeling (HMM) [7], Monte Carlo Tree
Search (MCTS) [3], and a reinforcement learning methodology [4]
to generate artificial student activity data that simulates
students’ behavior corresponding to the learning activities
captured in the log data. The original student data combined
with the artificially generated data is then used to derive more
accurate and complete models of students’ behaviors and the
strategies they use for learning.


In Section 2, we briefly review the Betty’s Brain OELE that
we use for this work, and describe the overall learner modeling
approach as well as the two main techniques that we employ,
i.e., HMMs and MCTS. Section 3 provides experimental results
and evaluates our learner modeling method by comparing the
analysis of the original data with that of the data generated
after reinforcement learning. Section 4 presents the discussion
and conclusions.


2. BACKGROUND 

We implement the learner modeling methods starting from
data collected from student work in the Betty’s Brain OELE.
Betty’s Brain is a learning-by-teaching environment, where
students use tools for information acquisition, solution
construction, and solution assessment to teach a virtual
character named Betty by constructing a causal map [5]. The
primary student actions in the Betty’s Brain environment
can be categorized as:


Information Acquisition (IA): These actions include reading
to learn new information (read) and searching for specific
knowledge (search). Taking and viewing notes (notes) is also
considered useful for information acquisition.


Solution Construction (SC): In Betty’s Brain, SC actions
are causal map editing actions (mapedit), which include
adding and deleting concepts and adding, deleting,
or changing links in the causal map.


Solution Assessment (SA): These actions consist of asking Betty
to take a quiz (quiz), to answer questions (query), and to explain
how she derived her answers using qualitative reasoning methods
(expl). In addition, students can mark the correctness of links
that have been added to assist their solution assessment.


Students’ performance is based on a map score that is computed
by comparing their causal models with a pre-specified
expert model. In our study, the expert model had 15 links,
which implies that the students could achieve a maximum map
score of 15. At any time, a student’s map score is computed
as the number of correct links minus the number of incorrect
links in the constructed (partial) map. Next, we describe
the learner modeling approach applied to Betty’s Brain.
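The map score rule above can be sketched in a few lines of Python. This is only an illustration: the function name map_score and the representation of a link as a (cause, effect, sign) triple are assumptions made for the example, not details taken from the Betty’s Brain system.

```python
def map_score(student_links, expert_links):
    """Map score = number of correct links minus number of incorrect links."""
    student = set(student_links)
    expert = set(expert_links)
    correct = len(student & expert)    # links that also appear in the expert map
    incorrect = len(student - expert)  # links that the expert map does not contain
    return correct - incorrect


# Hypothetical links, written as (cause, effect, sign) triples.
expert = {("A", "B", "+"), ("B", "C", "-"), ("C", "D", "+")}
student = {("A", "B", "+"), ("B", "C", "+"), ("C", "D", "+")}

print(map_score(student, expert))  # 2 correct - 1 incorrect = 1
```

Under this rule, a partial map with more incorrect than correct links has a negative score, which is consistent with the negative average map score reported for one cluster in Table 1.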


2.1 General Approach 

Figure 1 illustrates the general approach that we have developed
for our learner modeling method. As a first step,
we apply an HMM clustering method [6] that divides the students’
behaviors into groups of similar behaviors. We then
iteratively generate a more accurate HMM for each
group by running an MCTS algorithm that, combined with
a reinforcement learning approach, produces a number of
additional student behavior sequences that provide more
coverage of the students’ learning behaviors. These additional
sequences, when combined with the original student data,
are used to learn a new HMM that we believe is a more
complete description of the students’ learning behaviors.


Figure 1: Architecture of the Overall Approach


2.2 HMM applied to Learner Modeling

An HMM is defined as a tuple, i.e., λ = {A, B, π}, where
A and B represent the state transition probability distribution
and emission probability distribution matrices, respectively,
while π is the initial state probability distribution [7]. Figure 2
presents the state diagram of a simple HMM example
trained on two action sequences S_1 and S_2 with only 4
action types. Although not explicitly shown in the action
sequences, the hidden states h_1 and h_2 can be interpreted as
an IA state (searching for and reading resources) and an SC state
(editing concept entities and causal links), respectively.
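To make the tuple concrete, the following sketch encodes the two-state example of Figure 2 and scores an action sequence with the standard forward algorithm. The initial and emission probabilities are the ones shown in the figure. The transition matrix is an assumption: the 33% and 0% labels are read as the h1-to-h2 and h2-to-h1 transitions, and the self-transitions are filled in so that each row sums to 1.

```python
STATES = ["h1", "h2"]

# Initial state distribution (pi) and emission probabilities (B) from Figure 2.
PI = {"h1": 1.0, "h2": 0.0}
B = {
    "h1": {"read": 0.67, "search": 0.33},
    "h2": {"concedit": 0.5, "linkedit": 0.5},
}
# Transition probabilities (A); assumed reading of the figure, see the lead-in.
A = {
    "h1": {"h1": 0.67, "h2": 0.33},
    "h2": {"h1": 0.0, "h2": 1.0},
}


def sequence_likelihood(actions):
    """Forward algorithm: P(actions | model), summed over all hidden paths."""
    alpha = {s: PI[s] * B[s].get(actions[0], 0.0) for s in STATES}
    for action in actions[1:]:
        alpha = {
            s: sum(alpha[prev] * A[prev][s] for prev in STATES)
            * B[s].get(action, 0.0)
            for s in STATES
        }
    return sum(alpha.values())


s2 = ["read", "read", "concedit", "linkedit"]
print(sequence_likelihood(s2))
print(sequence_likelihood(["concedit", "read"]))  # 0.0: the model cannot start in h2
```

A sequence that begins with an SC action gets zero likelihood because all initial probability mass sits on the IA state h1.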


Based on the different probability distributions over
observations (actions), each hidden state can be labeled by the
primary actions associated with that state. The transitions
between states capture changes in student activities over
time, as well as frequent patterns of activities, e.g., the frequent
occurrence of information acquisition followed by solution
construction.
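One way to automate this labeling is to sum the emission probabilities per action category and name the state after the dominant category. The sketch below is a heuristic written for illustration only: the function label_state, the balance_margin threshold, and the example emission values are assumptions, and the placement of clmark under SC follows the grouping shown in the state diagrams of Figure 3.

```python
ORDER = ["IA", "SC", "SA"]
CATEGORY = {
    "read": "IA", "search": "IA", "notes": "IA",
    "mapedit": "SC", "clmark": "SC",
    "quiz": "SA", "query": "SA", "expl": "SA",
}


def label_state(emission, balance_margin=0.25):
    """Label a hidden state by its dominant action category.

    If the two largest categories are within balance_margin of each other,
    the state is reported as balanced between them (hypothetical rule).
    """
    totals = {c: 0.0 for c in ORDER}
    for action, p in emission.items():
        totals[CATEGORY[action]] += p
    ranked = sorted(ORDER, key=lambda c: totals[c], reverse=True)
    top, second = ranked[0], ranked[1]
    if totals[top] - totals[second] <= balance_margin:
        pair = sorted([top, second], key=ORDER.index)
        return "Balanced " + "&".join(pair)
    return top


# Hypothetical emission distributions for three states.
print(label_state({"read": 0.8, "search": 0.1, "mapedit": 0.1}))   # IA
print(label_state({"read": 0.4, "mapedit": 0.5, "quiz": 0.1}))     # Balanced IA&SC
print(label_state({"mapedit": 0.45, "quiz": 0.3, "expl": 0.25}))   # Balanced SC&SA
```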


[Figure 2 state diagram: h_1 with π_1 = 100%, p(read) = 67%, p(search) = 33%; h_2 with π_2 = 0%, p(concedit) = 50%, p(linkedit) = 50%; transition labels 33% and 0%.]

S_1: search; read; search; read; concedit; linkedit
S_2: read; read; concedit; linkedit


Figure 2: Simple HMM example.


2.3 Reinforcement Learning using MCTS


To learn accurate and robust HMMs, it is important that the
data set cover the range of behaviors a student exhibits in
sufficiently large numbers. However, given that we have limited
student activity data on the system, we suffer from the
data impoverishment problem. To address this problem, we
propose a novel reinforcement learning method that uses Monte
Carlo Tree Search (MCTS) and combines it with an initially
derived HMM to generate artificial data that matches
students’ learning behaviors. To generate action sequences
that simulate actual students’ behavior, we build the
MCTS tree and traverse it, iteratively picking the next best
node (the one with the highest number of simulations) as the
new action and adding it to the tail of the sequence. In the
reinforcement learning process illustrated in Figure 1, we
repeatedly generate simulated action sequences that maximize
a specified reward function, and add them to the previously
generated data. The reinforced data set is used to construct
a refined version of the HMMs.


MCTS performs an iterative search in which each iteration
consists of 4 steps, i.e., Selection, Expansion, Simulation
and Backpropagation [3]. In most MCTS implementations,
the Upper Confidence bounds applied to Trees (UCT)
algorithm is applied as the reward function for Selection:


\[ \mathrm{UCT}_i = \frac{w_i}{n_i} + c \sqrt{\frac{\ln t}{n_i}} \qquad (1) \]


where n_i is the number of simulations performed after adding
the ith action; c is the exploration parameter with a
typically chosen empirical value of 2; t is the total number
of simulation runs for the parent node, which is equal
to the sum of all the n_i; and w_i is the sum of wins (1’s) for all
simulations after adding the ith action.
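As an illustration of the Selection step, the sketch below scores each child with the commonly used UCT form w_i / n_i + c * sqrt(ln t / n_i) and picks the highest-scoring one. The function names, the example statistics, and the convention of trying unvisited children first are assumptions made for this example. The default c = 2 follows the value quoted in the text; the square root of 2 is another common choice in the MCTS literature.

```python
import math


def uct_score(w_i, n_i, t, c=2.0):
    """UCT value of a child with w_i accumulated reward over n_i simulations."""
    if n_i == 0:
        return math.inf  # assumed convention: unvisited children are tried first
    return w_i / n_i + c * math.sqrt(math.log(t) / n_i)


def select_child(stats, c=2.0):
    """stats maps action -> (w_i, n_i); t is the sum of all n_i."""
    t = sum(n for _, n in stats.values())
    return max(stats, key=lambda a: uct_score(stats[a][0], stats[a][1], t, c))


# Hypothetical child statistics for one parent node.
stats = {"read": (60.0, 100), "mapedit": (9.0, 10)}
print(select_child(stats))  # mapedit: higher mean reward and fewer visits
```

The second term grows for rarely visited children, so the rule trades off exploiting actions with a high average reward against exploring actions that have been simulated only a few times.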


We adopt a similar reward function and compute the w_i value
for generating action sequences that form a Reinforced
scaffolding model. In this model, the simulation results,
normalized over the range from the lowest to the highest
performance measure, are summed up to compute w_i. For example,
an action sequence has w_i = 1 when it achieves the maximum
map score (i.e., 15) in Betty’s Brain. This allows MCTS to better
utilize coherence relations [8] to generate action sequences
with more effective SC actions. The resulting HMM will
favor the use of more coherent actions and be able to capture
the evolution of learning behaviors/strategies that lead to
better learning performance. Such behavioral and strategic
evolution can provide the basis for adaptive scaffolding.
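The normalization described above can be sketched as a min-max scaling of the map score. The upper bound of 15 comes from the text; the lower bound MIN_SCORE, the clipping step, and the function names are assumptions made for illustration, since the text does not state the lowest performance value.

```python
MAX_SCORE = 15   # from the text: the expert model has 15 links
MIN_SCORE = -15  # assumption: hypothetical lowest map score


def normalized_reward(score, lo=MIN_SCORE, hi=MAX_SCORE):
    """Scale a map score into [0, 1]; the maximum score maps to 1."""
    clipped = max(lo, min(hi, score))
    return (clipped - lo) / (hi - lo)


def accumulate_w(simulated_scores):
    """w_i as the sum of normalized results over all simulations of a node."""
    return sum(normalized_reward(s) for s in simulated_scores)


print(normalized_reward(15))        # 1.0
print(accumulate_w([15, 0, -15]))   # 1.0 + 0.5 + 0.0 = 1.5
```

Replacing the binary win/loss outcome of classical UCT with a graded value in [0, 1] lets partially correct maps contribute proportionally to w_i.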


[Figure 3 panels: (a) HMM for cluster 1 (22 students); (b) HMM for cluster 2 (28 students); (c) HMM for cluster 3 (48 students). Each state diagram lists, per hidden state, the initial probability and the emission probabilities of IA actions (read, search, notes), SC actions (mapedit, clmark) and SA actions (quiz, query, expl).]

Figure 3: HMMs for the three clusters


Table 1: Comparison of the Three Clusters

          | IA state | SA state | Balanced IA&SC state | Balanced SC&SA state | Search & Note actions rate | Better strategic state transitions | S_g  | S_c
Cluster 1 | present  | present  | present              | present              | High                       | Yes                                | 6.22 | 7.5
Cluster 2 | present  | present  | -                    | -                    | Low                        | No                                 | 2.85 | -2.25
Cluster 3 | -        | present  | present              | present              | Low                        | Yes                                | 5.61 | 3.79


We use the HMM to constrain the Expansion and Simulation
steps so that MCTS does not expand unvisited nodes whose
associated actions are unlikely to occur in a given
state. With these simulation and expansion policies, the
generated action sequences always fit the HMM within
a specified variance range. Figure 4 shows a simple example
of generating an artificial action sequence by applying MCTS.
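The two ideas in this part of the method, constraining expansion with the HMM and picking the most-simulated child as the next action, can be sketched as follows. This is not the authors' implementation: the probability threshold stands in for the "specified variance range", the HMM is the two-state example of Figure 2 with assumed self-transitions, and the child labeled "other" is a placeholder for the third action in Figure 4.

```python
# Two-state example HMM (transition rows completed by assumption).
A = {"h1": {"h1": 0.67, "h2": 0.33}, "h2": {"h1": 0.0, "h2": 1.0}}
B = {
    "h1": {"read": 0.67, "search": 0.33},
    "h2": {"concedit": 0.5, "linkedit": 0.5},
}


def next_action_probs(belief, A, B):
    """P(next action) given a belief over the current hidden state."""
    probs = {}
    for state, b in belief.items():
        for nxt, p_trans in A[state].items():
            for action, p_emit in B[nxt].items():
                probs[action] = probs.get(action, 0.0) + b * p_trans * p_emit
    return probs


def allowed_actions(belief, A, B, threshold=0.05):
    """Actions MCTS may expand: those the HMM considers likely enough."""
    probs = next_action_probs(belief, A, B)
    return sorted(a for a, p in probs.items() if p >= threshold)


def pick_next_action(visit_counts):
    """After the search, append the child with the most simulations."""
    return max(visit_counts, key=visit_counts.get)


print(allowed_actions({"h1": 0.0, "h2": 1.0}, A, B))  # ['concedit', 'linkedit']
print(pick_next_action({"read": 185, "mapedit": 10, "other": 5}))  # read
print(pick_next_action({"read": 90, "mapedit": 105, "other": 5}))  # mapedit
```

With the belief concentrated on h2, IA actions fall below the threshold and are never expanded, which keeps generated sequences close to what the HMM would produce.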


[Figure 4 panels: with current sequence "search", the MCTS children are read (n_i = 185), mapedit (n_i = 10), and a third action (n_i = 5), so read is picked as the next action; with current sequence "search; read", the children are read (n_i = 90), mapedit (n_i = 105), and a third action (n_i = 5), so mapedit is picked as the next action.]

Figure 4: Simple example of applying MCTS for
generating action sequence. n_i is the number of
simulations performed during MCTS.


3. EXPERIMENTS AND ANALYSIS 

We use data from a Betty’s Brain study run with 98 6th
grade middle school students in a science classroom for our
experiments. An HMM clustering algorithm [6] is applied to
discover groups of action sequences with high within-cluster
homogeneity. This algorithm produced 3 clusters with the
highest Partition Mutual Information value. HMMs for the
three clusters are represented by the state diagrams shown
in Figure 3, where h_i represents the ith hidden state with
corresponding initial probability π_i. State transition
probabilities are marked on the transition links, while the
emission probability of an action a in a state is given by
p(a). To measure students’ performance in the different
clusters, we denote the average pre- to post-test score gain
as S_g and the average final causal map score of the
group as S_c. We combine this information to interpret and
compare students’ behaviors in the three different groups, as
shown in Table 1.


As we can see from Table 1, all three clusters have an SA
state (primarily focusing on SA actions). However, Cluster
3 does not have an IA state, while Cluster 2 does not have
states that balance efforts between IA & SC, and SC & SA.
These balanced efforts aim to use acquired information
or solution assessment results to support subsequent SC
actions. In addition, only Cluster 1 maintains a good proportion
of Search & Note actions, which are considered to be
more active ways of acquiring information. Students in Clusters
1 and 3 did better in strategic state transitions, while
for Cluster 2, self-transitions dominated in all states. The
performance measures of students in Cluster 1, i.e., S_g and
S_c, are the best among all three clusters.


3.1 Reinforced Scaffolding Model Analysis 


The reinforced scaffolding model described in Section 2.3
aims to capture useful behavioral and strategic evolution.
To validate it, we analyze the generated reinforced
HMMs along with artificial action sequences that equal the
sample size of the original data set. The reinforced HMMs are
shown in Figure 5.


Compared to the original HMMs (Figure 3), the HMMs for
the three clusters gradually converge to an isomorphic 3-state
HMM structure. The differences between the original and refined
HMMs can be summarized as follows: (1) the HMMs tend to
redistribute the efforts made between IA & SC, as well as
SC & SA, e.g., the proportion of IA in h_1 is decreased for
cluster 1 but it is increased for the other two clusters. Given
the probability of IA supporting SC, P_{IA→SC} = 0.43 ≈ 3:7
according to statistics, the reinforced HMMs tend to have all
SC actions supported by at least one IA action by converging
the emission probabilities of IA and SC towards a ratio
of 70% : 30%. This is because the SC actions that are supported
by IA actions have a higher probability of being effective
(the ratio for unsupported:supported mapedits to be correct
is 0.41 : 0.53); and (2) the usage frequency of actions
such as search increases significantly, especially for clusters
2 and 3. An explanation for this phenomenon is that in the
few cases in which search appeared in the original data set, it
is very likely followed by a read that supports a subsequent
mapedit. The original HMM captures this pattern by having
a hidden state h_s with relatively high emission probabilities
for search, read and mapedit. When MCTS expands to a node
with a search action, the posterior probability
for the hidden state to remain in h_s is high and, therefore,
further expansion can form this specific pattern and result
in a higher chance of a correct mapedit. Since the reward
function is designed to optimize the causal map score, the
reinforcement learning is likely to follow this pattern more
frequently when generating artificial action sequences.


Figure 5: Reinforced Scaffolding HMMs for the three clusters


4. DISCUSSION AND CONCLUSIONS 


In this paper, we proposed a novel reinforcement learning
method for learner modeling, which integrates Hidden Markov
Models and Monte Carlo Tree Search within a reinforcement
learning framework to generate more accurate learner
models for groups of students. We applied the HMM clustering
algorithm to divide students into groups based on their
behaviors, and presented an analysis and interpretation of these
groups to explain the clustering results.


We then used data of student activities collected from a
study with the Betty’s Brain OELE and generated reinforced
data sets along with the Reinforced scaffolding model.
The experiments showed promising results according to our
interpretation: we were able to generate and interpret
reinforced HMMs by analyzing the evolution of learning
behaviors that can lead to better performance in building
causal maps.


In future work, we will develop scaffolding methods to support
students in learning new, more productive behaviors and
strategies as they work on the system. It will also be of
interest to study how our reinforcement learning method
works with longitudinal studies of students, collecting data
across longer periods of time to generate dynamic coherence
models. In addition, we will collect data from other learning
environments, or even from other domains, to see how
well our modeling methods perform.